Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Upgrade to google.golang.org/grpc v1.66.2 / modify certain protobuf messages to retain their unmarshaling buffer #9401

Merged
merged 16 commits into from
Oct 22, 2024

Conversation

aknuds1
Copy link
Contributor

@aknuds1 aknuds1 commented Sep 25, 2024

What this PR does

Upgrade google.golang.org/grpc from v1.65.0 to v1.66.2. The performance regression referenced from dskit's reversal to v1.65.0 is fixed in v1.66.2, and the potential functional regression also referenced in said reversal is Loki specific.

What we found out is that v1.66 fixes a serious performance bottleneck in conjunction with compression; the decompress function is optimized through memory pooling, whereas v1.65.0 would use io.ReadAll and cause a lot of allocations during decompression. As a result, we saw ingester CPU usage increase by ~40% when enabling gRPC compression. The increase is now much more modest, maybe ~7% based on observations.

In order to keep allowing for unsafe references to unmarshalling buffers in e.g. LabelAdapter, I had to introduce a custom gRPC unmarshalling hook (mimirpb.CodecV2.Unmarshal) plus a scheme for protobuf messages to keep a reference to their unmarshalling buffer (mimirpb.UnmarshalerV2). The underlying reason is that from v1.66 on, gRPC immediately recycles each buffer after unmarshalling, unless a reference is kept. This causes data races with e.g. LabelAdapter taking unsafe string references to the unmarshal buffer, unless, as mentioned, a buffer reference is kept with the root protobuf message.

Ideally, one should call FreeBuffer on protobuf messages keeping an unmarshalling buffer reference so the buffer can be given back to the gRPC pool. If this isn't done though, there also shouldn't be a memory leak since the buffer should be garbage collected.

Which issue(s) this PR fixes or relates to

Checklist

  • Tests updated.
  • Documentation added.
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX].
  • about-versioning.md updated with experimental features.

@aknuds1 aknuds1 marked this pull request as ready for review September 25, 2024 07:54
@aknuds1 aknuds1 requested review from stevesg, grafanabot and a team as code owners September 25, 2024 07:54
@aknuds1 aknuds1 marked this pull request as draft September 25, 2024 07:54
@aknuds1 aknuds1 marked this pull request as ready for review September 25, 2024 07:56
@aknuds1
Copy link
Contributor Author

aknuds1 commented Sep 25, 2024

Some tests are breaking for this upgrade, have to fix them.

@aknuds1 aknuds1 marked this pull request as draft September 25, 2024 08:17
@aknuds1 aknuds1 changed the title Upgrade to google.golang.org/grpc v1.66.2 WIP: Upgrade to google.golang.org/grpc v1.66.2 Sep 25, 2024
@aknuds1
Copy link
Contributor Author

aknuds1 commented Sep 25, 2024

Tests currently pass, but that is through undoing some memory unsafe optimizations in the mimirpb package. We want to investigate whether the optimizations can safely be retained.

Edit: Memory unsafe optimizations are back again :) Solved by letting relevant protobuf messages keep an unmarshalling buffer reference.

@aknuds1 aknuds1 force-pushed the arve/upgrade-grpc branch 16 times, most recently from d79ea9f to f484e3a Compare September 30, 2024 11:45
@aknuds1 aknuds1 changed the title Upgrade to google.golang.org/grpc v1.66.2 Upgrade to google.golang.org/grpc v1.66.2 / modify certain protobuf messages to retain their unmarshaling buffer Oct 18, 2024
@aknuds1 aknuds1 requested a review from colega October 21, 2024 12:03
pkg/distributor/query.go Outdated Show resolved Hide resolved
Signed-off-by: Arve Knudsen <[email protected]>
Copy link
Contributor

@colega colega left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thank you for addressing all my comments.

@aknuds1 aknuds1 merged commit accccf4 into main Oct 22, 2024
29 checks passed
@aknuds1 aknuds1 deleted the arve/upgrade-grpc branch October 22, 2024 07:40
colega added a commit that referenced this pull request Oct 23, 2024
Follow up on #9401

While the write path has been carefully crafted to make sure that we
don't keep references to the strings backed by arrays with data from
grpc stream (because it would cause a memory leak), this doesn't seem to
be the case in the query path, where some tests started failing after we
started to reuse those backing arrays through the usage of memory
buffers implemented in the new gRPC library.

This change reverts the buffer freeing, means that it won't be recycled
(and will be garbage collected) to ensure data correctness, while we
investigate where the data references are kept.

Signed-off-by: Oleg Zaytsev <[email protected]>
colega added a commit that referenced this pull request Oct 23, 2024
Follow up on #9401

While the write path has been carefully crafted to make sure that we
don't keep references to the strings backed by arrays with data from
grpc stream (because it would cause a memory leak), this doesn't seem to
be the case in the query path, where some tests started failing after we
started to reuse those backing arrays through the usage of memory
buffers implemented in the new gRPC library.

This change reverts the buffer freeing, means that it won't be recycled
(and will be garbage collected) to ensure data correctness, while we
investigate where the data references are kept.

Signed-off-by: Oleg Zaytsev <[email protected]>
duricanikolic added a commit that referenced this pull request Nov 1, 2024
…otobuf messages to retain their unmarshaling buffer (#9401)"

This reverts commit accccf4.
leizor added a commit that referenced this pull request Nov 1, 2024
…otobuf messages to retain their unmarshaling buffer (#9401)"

This reverts commit accccf4.
duricanikolic added a commit that referenced this pull request Nov 4, 2024
…otobuf messages to retain their unmarshaling buffer (#9401)

Signed-off-by: Yuri Nikolic <[email protected]>
duricanikolic added a commit that referenced this pull request Nov 4, 2024
* Revert "Don't free buffers after reading query stream (#9721)"

This reverts commit f7b6017.

* Revert "Upgrade to google.golang.org/grpc v1.66.2 / modify certain protobuf messages to retain their unmarshaling buffer (#9401)"

This reverts commit accccf4.
duricanikolic added a commit that referenced this pull request Nov 4, 2024
* Revert "Don't free buffers after reading query stream (#9721)"

This reverts commit f7b6017.

* Revert: Upgrade to google.golang.org/grpc v1.66.2 / modify certain protobuf messages to retain their unmarshaling buffer (#9401)

Signed-off-by: Yuri Nikolic <[email protected]>

* Revert "Distributor.queryIngesterStream: Free gRPC buffers upon error (#9756)"

This reverts commit eda1a4b.

---------

Signed-off-by: Yuri Nikolic <[email protected]>
aknuds1 added a commit that referenced this pull request Nov 4, 2024
* [release-2.14] fix(deps): update github.com/thanos-io/objstore digest to f90c89a (main) (#9625)

* Update github.com/thanos-io/objstore digest to f90c89a (#9534)

Signed-off-by: Arve Knudsen <[email protected]>
Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
(cherry picked from commit 3c97a61)

* update changelog

Signed-off-by: Vladimir Varankin <[email protected]>

---------

Signed-off-by: Vladimir Varankin <[email protected]>
Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Co-authored-by: Vladimir Varankin <[email protected]>

* 2.14.1 Prepare patch release (#9796)

* bump patch version

Signed-off-by: Vladimir Varankin <[email protected]>

* rebuild assets

Signed-off-by: Vladimir Varankin <[email protected]>

---------

Signed-off-by: Vladimir Varankin <[email protected]>

* MQE: Fix handling of string results (#9803)

This would previously panic after the query was closed

* chore(deps): update grafana/mimirtool docker tag to v2.14.1 (#9806)

Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>

* Revert upgrade to google.golang.org/grpc v1.66.2 (#9811)

* Revert "Don't free buffers after reading query stream (#9721)"

This reverts commit f7b6017.

* Revert: Upgrade to google.golang.org/grpc v1.66.2 / modify certain protobuf messages to retain their unmarshaling buffer (#9401)

Signed-off-by: Yuri Nikolic <[email protected]>

* Revert "Distributor.queryIngesterStream: Free gRPC buffers upon error (#9756)"

This reverts commit eda1a4b.

---------

Signed-off-by: Yuri Nikolic <[email protected]>

* Improve lock contention affecting read and write latencies during TSDB head compaction (cherry-pick Prometheus PR 15242) (#9822)

* Cherry-pick Prometheus PR 15242

Signed-off-by: Marco Pracucci <[email protected]>

* Added CHANGELOG entry

Signed-off-by: Marco Pracucci <[email protected]>

* Updated CHANGELOG

Signed-off-by: Marco Pracucci <[email protected]>

---------

Signed-off-by: Marco Pracucci <[email protected]>

---------

Signed-off-by: Vladimir Varankin <[email protected]>
Signed-off-by: Yuri Nikolic <[email protected]>
Signed-off-by: Marco Pracucci <[email protected]>
Co-authored-by: Grot (@grafanabot) <[email protected]>
Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Co-authored-by: Vladimir Varankin <[email protected]>
Co-authored-by: Joshua Hesketh <[email protected]>
Co-authored-by: Đurica Yuri Nikolić <[email protected]>
Co-authored-by: Marco Pracucci <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants